Analysis of single-cell FACS and RNA-seq data of brain cells

In this workflow we examine four different protocols for astrocyte isolation.

First step: use the metacell package to find the different cell populations in our SPID-seq data.

Loading the library:

library("metacell")

To start using MetaCell, you first initialize a database. This is not much more than linking the package to a directory that stores all your objects. In our case we initialize the database to the saved_work directory:

wdir = "/home/labs/amit/diklag/thesis/astro/no_spinal_cord_new/"
setwd(wdir)
if(!dir.exists("saved_work")) dir.create("saved_work/")
scdb_init("saved_work/", force_reinit=T)
## initializing scdb to saved_work/
tgconfig::override_params("annotations/astro_params.yaml","metacell")
src_functions = "../sc_aux_functions.R"

force_reinit=T instructs the system to override existing database objects. This is important when you run in cycles and want to update your objects; otherwise the database reuses loaded objects to save the time of reading and initializing them from disk.

tgconfig::override_params overrides the default metacell configuration with the parameters in annotations/astro_params.yaml

Before starting to analyze the data, we link the package to a figure directory:

if(!dir.exists("results")) dir.create("results/")
scfigs_init("results/")

We will read multiple SPID UMI matrices (umi.tab) and merge them, based on a table defining the datasets:

- mat_nm defines the ID of the matrix object (and will be the name of all downstream objects)
- base_dir defines the umi.tab directory
- dataset_table_fn defines the index file of the SPID multi-batch dataset. This is a tab-delimited text file with an arbitrary number of columns and a header line. The three mandatory fields are:
  - Amp.Batch.ID - the ID of the batch defined by the row, and also the file name (without the .txt suffix) of the respective UMI table in base_dir.
  - Seq.Batch.ID - the ID of the sequencing batch (may be relevant for further noise cleanups beyond those done in the low-level pipeline).
  - Batch.Set.ID - groups different batches into sets for downstream analysis (e.g. QC and more).

index_fn = "annotations/astro.txt"
id = "brain" 

Let us take a look at our index file:

Amp.Batch.ID Seq.Batch.ID Batch.Set.ID Protocol_version_ID Pool_barcode R2_design Experiment_ID Owner Species Empty_wells ng.ul length QC.Primer QC1..pooled.cDNA. QC2..FinalLib.
AB952 SB57 Quintana protocol 1 brain P1 SPID V0.9 ATTGCGAC 7W.8R Quintana protocol 1 brain Mor_K Mouse O1,O2,P1,P2 8.06 358 mActb 19.705 11.470
AB953 SB57 Quintana protocol 1 brain P2 SPID V0.9 GTCTCATC 7W.8R Quintana protocol 1 brain Mor_K Mouse O1,O2,P1,P2 12.00 400 mActb 19.865 10.740
AB954 SB57 Quintana protocol 2 brain P1 SPID V0.9 GCGTGACC 7W.8R Quintana protocol 2 brain Mor_K Mouse O1,O2,P1,P2 11.00 421 mActb 19.330 10.815
AB955 SB57 Quintana protocol 2 brain P2 SPID V0.9 ATCGGATC 7W.8R Quintana protocol 2 brain Mor_K Mouse O1,O2,P1,P2 10.10 405 mActb 19.520 10.860
AB956 SB57 Quintana protocol 3 brain P1 SPID V0.9 CTTGCACC 7W.8R Quintana protocol 3 brain Mor_K Mouse O1,O2,P1,P2 11.90 399 mActb 18.990 10.535
AB957 SB57 Quintana protocol 3 brain P2 SPID V0.9 ACTGGAAC 7W.8R Quintana protocol 3 brain Mor_K Mouse O1,O2,P1,P2 7.27 327 mActb 18.780 10.335
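Before importing, it can be worth checking that the index file actually carries the mandatory fields. A minimal sketch, assuming the index file path from above and the column names described earlier:

```r
# Sketch: sanity-check the batch index file before importing
index = read.delim("annotations/astro.txt", stringsAsFactors = FALSE)
mandatory = c("Amp.Batch.ID", "Seq.Batch.ID", "Batch.Set.ID")
stopifnot(all(mandatory %in% colnames(index)))
# how many amplification batches per protocol/set
table(index$Batch.Set.ID)
```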

Let’s load the matrix into the system:

umi.tab_dir = "/home/labs/amit/eyald/sc_pipeline/scdb_v4_mouse/output/umi.tab/"
mcell_import_multi_mars(mat_nm = id, dataset_table_fn = index_fn, base_dir = umi.tab_dir, force = T)
## will read AB952
## will read AB953
## will read AB954
## will read AB955
## will read AB956
## will read AB957
## will read AB958
## will read AB959
## [1] TRUE
mat = scdb_mat(id)
print(dim(mat@mat))
## [1] 52634  3072

The scdb_mat() command returns a matrix object, which has one slot containing the count matrix - mat@mat, as well as additional features we will mention below.
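A sketch of how the matrix object can be inspected; the cell_metadata slot (holding the per-cell annotations merged from the index file) is assumed from the standard metacell scmat layout:

```r
# Sketch: basic inspection of the matrix object returned by scdb_mat()
mat = scdb_mat(id)
dim(mat@mat)                    # genes x cells count matrix
head(mat@cell_metadata[, 1:3])  # per-cell metadata (assumed slot name)
summary(colSums(mat@mat))       # UMI depth per cell
```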

MetaCell uses a standardized naming scheme for figures, to make it easier to archive and link analysis figures to the database objects. In principle, figures in the figures directory are named after the object data type they refer to (for example, mat for matrices, mc for metacells, and more, see below). The figure name also includes the name of the object it refers to and a suffix describing the actual figure type.

Exploring and filtering the UMI matrix

To get a basic understanding of the new data, we will plot the distribution of UMI count per cell (the plot is thresholded after 50 bins):

mcell_plot_umis_per_cell(id,min_umis_cutoff = 100)
## [1] 100
Umi distribution plot

We want to clean some known issues from the matrix before starting to work with it. We generate a list of genes to exclude: ERCC spike-ins, poorly annotated predicted genes (Gm* and *Rik), mitochondrial genes that typically mark cells as stressed or dying, and the abundant nuclear lncRNA Malat1.

mat = scdb_mat(id)
nms = c(rownames(mat@mat), rownames(mat@ignore_gmat))
bad_genes = c(grep("ERCC", nms, v=T),
              grep("^Gm[0-9]", nms, v=T),
              grep(".*Rik$",nms,v=T),
              grep("^mt-", nms, v=T),
              grep("Malat1", nms, v=T))

We will next ask the package to ignore the genes listed above:

mcell_mat_ignore_genes(new_mat_id=id, mat_id=id, bad_genes, reverse=F) 

Ignored genes are kept in the matrix for reference, but all downstream analysis will disregard them. This means that the number of UMIs from these genes cannot be used to distinguish between cells.
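A quick way to confirm the split, using the bad_genes vector built above: the ignored genes should no longer appear in mat@mat, while remaining accessible in the ignore_gmat slot.

```r
# Sketch: verify ignored genes moved out of the working matrix
mat = scdb_mat(id)
sum(rownames(mat@mat) %in% bad_genes)                    # expect 0
length(intersect(rownames(mat@ignore_gmat), bad_genes))  # kept for reference
```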

In the current example we will also eliminate cells with fewer than 100 UMIs (the threshold can be set based on examination of the UMI count distribution):

mcell_mat_ignore_small_cells(id, id, 100)

Note that filtering decisions can be iteratively modified given results of the downstream analysis.

Selecting feature genes

We move on to computing statistics on the distributions of each gene in the data, which are going to be our main tool for selecting feature genes for MetaCell analysis:

mcell_add_gene_stat(gstat_id=id, mat_id=id, force=T)
## Calculating gene statistics...
## will downsamp
## done downsamp
## will gen mat_n
## done gen mat_n
## done computing basic gstat, will compute trends
## ..done

This generates a new object of type gstat under the name stored in id ("brain"), by analyzing the count matrix with the same id. We can explore interesting genes and their distributions:

gstat = scdb_gstat(id)
t_vm = quantile(gstat$ds_vm_norm,c(1:10)/10)[8]

We can also move directly to select a gene set for downstream analysis: We create a new object of type gset (gene set), to which all genes whose scaled variance (variance divided by mean) exceeds a given threshold are added:

mcell_gset_filter_multi(gstat_id=id, gset_id=id, T_tot=50, T_top3=4,  T_vm = t_vm, force_new = T)
## Selected 1026 markers

The command creates a new gene set with all genes whose scaled variance exceeds that of 80% of the genes (t_vm is the 80th percentile computed above). It also restricts this gene set to genes with at least 50 UMIs across the entire dataset, and requires selected genes to have at least three cells in which more than 4 UMIs were recorded.
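The three criteria can be reproduced directly on the gstat table. This is a sketch: the column names tot and ds_top3 are assumptions about the gstat layout (ds_vm_norm is used above), so the count may not match the reported marker number exactly.

```r
# Sketch: re-derive the marker selection from the gene statistics table
gstat = scdb_gstat(id)
pass = gstat$ds_vm_norm > t_vm &   # scaled variance above the 80th percentile
       gstat$tot >= 50 &           # assumed column: total UMIs per gene
       gstat$ds_top3 > 4           # assumed column: 3rd-highest downsampled count
sum(pass, na.rm = TRUE)            # should be close to the reported marker count
```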

We update the gset object by removing a list of irrelevant genes (ribosomal and mitochondrial) that we do not want to affect the clustering process:

gset_unfiltered = scdb_gset(id)
rp_markers = grep("Rpl|Rps|mt-", names(gset_unfiltered@gene_set), v=T)
markers_to_keep = setdiff(names(gset_unfiltered@gene_set), unique(c(bad_genes,rp_markers)))

scdb_del_gset(id)
## [1] TRUE
tmp = rep(1, length(markers_to_keep)); names(tmp) = markers_to_keep
new_gset = gset_new_gset(tmp, "ribos filtered out")
scdb_add_gset(id, new_gset)

We can refine our parameters by plotting all genes and our selected gene set given the mean and variance statistics:

mcell_plot_gstats(gstat_id=id, gset_id=id)
## png 
##   2
var mean plot

Building the balanced cell graph

Assuming we are happy with the selected genes, we will move forward to create a similarity graph (cgraph), using a construction called balanced K-nn graph:

set.seed(27)
mcell_add_cgraph_from_mat_bknn(mat_id=id,gset_id = id,graph_id=id,K=100,dsamp=T)
## will downsample the matrix, N= 750
## will build balanced knn graph on 2974 cells and 1025 genes, this can be a bit heavy for >20,000 cells

This adds to the database a new cgraph object under the id brain. The K=100 parameter is important, as it affects the size distribution of the derived metacells. Note that constructing the graph can become computationally intensive beyond 20,000-30,000 cells. The system is currently limited by memory; we have generated a graph on 160,000 cells on machines with 0.5 TB RAM.
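The resulting graph can be inspected before moving on. A sketch; the nodes and edges slot names are assumptions about the cgraph object layout:

```r
# Sketch: peek at the balanced K-nn graph (slot names are assumed)
cgraph = scdb_cgraph(id)
length(cgraph@nodes)  # number of cells included in the graph
head(cgraph@edges)    # weighted edge list between cells
```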

Resampling and generating the co-clustering graph

The next step will use the cgraph to sample five hundred metacell partitions, each covering 75% of the cells and organizing them in dense subgraphs:

set.seed(27)
mcell_coclust_from_graph_resamp(coc_id=id,graph_id=id,min_mc_size=15,p_resamp=0.75, n_resamp=500)
## running bootstrap to generate cocluster
## 2%...6%...14%...20%...28%...37%...43%...51%...59%...65%...72%...78%...86%...94%...100%
## done resampling

The metacell size distribution of the resampled partitions is largely determined by the K parameter used for computing the cgraph. The resampling process may take a while on very large graphs; you can lower n_resamp to generate fewer resamples.

The resampling procedure creates a new coclust object in the database under the id brain, storing the number of times each pair of cells ended up being part of the same metacell. These co-clustering statistics are used to generate a new similarity graph, based on which accurate calling of the final set of metacells is done:
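The stored pair counts can be examined directly. A sketch; the scdb_coclust accessor and the coclust slot are assumptions about the coclust object:

```r
# Sketch: inspect the co-clustering statistics (accessor/slot names assumed)
coc = scdb_coclust(id)
head(coc@coclust)  # cell pairs and the number of resamples they co-clustered in
```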

set.seed(27)
mcell_mc_from_coclust_balanced(coc_id=id,mat_id= id,mc_id= id,K=30, min_mc_size=15, alpha=2)
## filtered 659392 left with 188948 based on co-cluster imbalance
## building metacell object, #mc 36
## add batch counts
## compute footprints
## compute absolute ps
## compute coverage ps
## reordering metacells by hclust and most variable two markers
## reorder on C1qb vs Dbi

We created a metacell object brain based on analysis of the co-clustering graph. The parameter K determines the minimal number of neighbors we wish to associate with each cell. Prior to partitioning, the co-cluster graph is filtered to eliminate highly unbalanced edges, with smaller alpha resulting in harsher filtering.

Removing outlier cells

We now have a preliminary metacell object. It is a good practice to make sure all metacells within it are homogeneous. This is done by the outlier scan procedure, which splits metacells whose underlying similarity structure supports the existence of multiple sub-clusters, and removes outlier cells that strongly deviate from their metacell’s expression profile.

fc = 4
new_id = paste0(id,"_f")
mcell_mc_split_filt(new_mc_id=new_id, 
            mc_id=id, 
            mat_id=id,
            T_lfc=fc, plot_mats=F)
## starting split outliers
## splitting metacell 26
## splitting metacell 27
## add batch counts
## compute footprints
## compute absolute ps
## compute coverage ps
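To see what the outlier scan did, compare the raw and filtered metacell objects; both store a per-cell assignment vector in the mc slot (used again further below):

```r
# Sketch: compare cell counts before and after outlier filtering
mc_raw  = scdb_mc(id)
mc_filt = scdb_mc(paste0(id, "_f"))
length(mc_raw@mc)   # cells assigned before filtering
length(mc_filt@mc)  # cells retained after outlier removal and splitting
table(mc_filt@mc)   # size distribution of the filtered metacells
```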

Creating heatmaps of metacells and genes

We will first assign colors to our clusters from a fixed palette (these can later be replaced with custom color definitions, e.g. based on cell type assignments).

mc<- scdb_mc(new_id)
mc@colors <- colorRampPalette(c("darkgray", "burlywood1", "chocolate4","orange", "red", "purple", "blue","darkgoldenrod3", "cyan"))(ncol(mc@mc_fp))
no_color_mc = paste0("no_color_",new_id)
scdb_add_mc(id = no_color_mc,mc = mc)
scdb_add_mc(id = new_id,mc = mc)
mc<- scdb_mc(no_color_mc)

The filtered metacell object brain_f can now be visualized. In order to do this effectively, we usually go through one or two iterations of selecting informative marker genes. The package can select markers for you automatically - by simply looking for genes that are strongly enriched in any of the metacells:

mcell_gset_from_mc_markers(gset_id=paste0(new_id,"_markers"), mc_id=no_color_mc)
mcell_mc_plot_marks(mc_id=no_color_mc, gset_id=paste0(new_id,"_markers"), mat_id=id,plot_cells = F)
heatmap_marks_mc

Visualizing the MC confusion matrix in order to colorize the metacells

While 2D projections are popular and intuitive (albeit sometimes misleading) ways to visualize scRNA-seq results, we can also summarize the similarity structure among metacells using a “confusion matrix” which encodes the pairwise similarities between all metacells. This matrix may capture hierarchical structures or other complex organizations among metacells.
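The raw confusion matrix itself can be computed directly. A sketch; mcell_mc_confusion_mat and its K parameter are assumptions about the package API:

```r
# Sketch: compute the metacell-by-metacell confusion matrix (API assumed)
confu = mcell_mc_confusion_mat(mc_id = no_color_mc, graph_id = id, K = 100)
dim(confu)  # metacells x metacells counts of cross-metacell similarity edges
```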

We first create a hierarchical clustering of metacells, based on the number of similarity relations between their cells:

mc_hc = mcell_mc_hclust_confu(mc_id=no_color_mc,graph_id=id)

Next, we generate clusters of metacells based on this hierarchy, and visualize the confusion matrix and these clusters. The confusion matrix is shown at the bottom, and the top panel encodes the cluster hierarchy (subtrees in blue, sibling subtrees in gray):

mc_sup = mcell_mc_hierarchy(mc_id=no_color_mc,mc_hc=mc_hc, T_gap=0.04)
save(file="saved_work/mc_hc_sup.Rda",mc_hc,mc_sup)
source(src_functions)
local_mcell_mc_plot_hierarchy(mc_id=no_color_mc,graph_id=id,mc_order=mc_hc$order,sup_mc = mc_sup,width=3500, height=3500, min_nmc=2)
## png 
##   2
confusion matrix

source(src_functions)
mc =  scdb_mc(no_color_mc)
sup_col = data.frame(supid=c(),color=c(),name=c())
sup_col = rbind(sup_col,data.frame(supid=c(6),color=c("red"),name=c("Microglia"))) 
sup_col = rbind(sup_col,data.frame(supid=c(12),color=c("green"),name=c("Astrocytes")))
sup_col = rbind(sup_col,data.frame(supid=c(13),color=c("purple"),name=c("Oligodendrocytes"))) 
sup_col = rbind(sup_col,data.frame(supid=c(19),color=c("gray"),name=c("Others"))) 
sup_col = rbind(sup_col,data.frame(supid=c(14),color=c("brown"),name=c("Epithelial"))) 
sup_col = rbind(sup_col,data.frame(supid=c(18),color=c("navy"),name=c("Neutrophils"))) 
sup_col = rbind(sup_col,data.frame(supid=c(20),color=c("pink"),name=c("T cells"))) 

# 13    yellow  granulocytes
write.table(sup_col,file = "annotations/sup_col.txt",row.names = FALSE,sep = "\t")
gene_key = data.frame(name = c(), gene =c(), color = c(), T_fold = c())
gene_key = rbind(gene_key,data.frame(name = c("monocytes"), gene =c("H2-Aa"), color = c("blue"), T_fold = c(6)))
write.table(gene_key, file="annotations/gene_key.txt",row.names = FALSE,sep = "\t")
mc_colorize_sup_hierarchy(mc_id = new_id,supmc = mc_sup,supmc_key = "annotations/sup_col.txt",gene_key = "annotations/gene_key.txt")
local_mcell_mc_plot_hierarchy(mc_id=new_id,graph_id=id,mc_order=mc_hc$order,sup_mc = mc_sup,width=3500, height=3500, min_nmc=2)
## png 
##   2

After coloring, our confusion matrix looks like this:

confusion matrix

Projecting metacells and cells in 2D

Heat maps are useful but sometimes hard to interpret, so we may also want to visualize the similarity structure among metacells (or among cells within metacells). To this end we construct a 2D projection of the metacells, and use it to plot the metacells and key similarities between them (shown as connecting edges), as well as the cells. This plot uses the same metacell coloring we established before (and if we later improve the coloring based on additional analysis, the plot can be regenerated):

set.seed(27)
source(src_functions)
mcell_mc2d_force_knn(mc2d_id= new_id,mc_id=new_id, graph_id=id)
## comp mc graph using the graph brain and K 20
tgconfig::set_param("mcell_mc2d_height",2200, "metacell")
tgconfig::set_param("mcell_mc2d_width",2200, "metacell")
local_mcell_mc2d_plot(mc2d_id=new_id,plot_edges = T)
## png 
##   2

Note that we changed the metacell parameters mcell_mc2d_height/width to get a reasonably sized figure. There are many additional parameters that can be tuned in MetaCell; those meant for routine tuning are discussed in other vignettes. We obtain the following figure:

proj2d mean plot
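The projection computed above stores coordinates that can be pulled out for custom plotting. A sketch; the sc_x/sc_y/mc_x/mc_y slot names are assumptions about the mc2d object:

```r
# Sketch: extract 2D coordinates for custom plots (slot names assumed)
mc2d = scdb_mc2d(new_id)
head(data.frame(x = mc2d@sc_x, y = mc2d@sc_y))  # per-cell coordinates
head(data.frame(x = mc2d@mc_x, y = mc2d@mc_y))  # per-metacell coordinates
```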

mcell_mc_plot_marks(mc_id=new_id, gset_id=paste0(new_id,"_markers"), mat_id=id,plot_cells = F)
heatmap_marks_mc

# exports a table of clusters and their marker genes
mcell_mc_export_tab(mc_id=new_id,gstat_id=id,mat_id=id,T_gene_tot=100, metadata_fields=c("amp_batch_id"))

Index sorting data

Logicle transformation helper functions (copy and run all the code in this chunk). These adapt ggplot2 axis scales to the logicle transform commonly used for flow-cytometry data; they rely on flowCore and scales, which are loaded further below.
pretty10exp <-function (x, drop.1 = FALSE, digits.fuzz = 7)
{
  mi = min(x,na.rm = TRUE)
  eT <- floor(log10(abs(x)) + 10^-digits.fuzz)
  mT <- signif(x/10^eT, digits.fuzz)
  ss <- vector("list", length(x))
  for (i in seq(along = x)) ss[[i]] <- if (is.na(x[i]))
    quote(NA)
  else if (x[i] == 0)
    quote(0)
  else if (drop.1 && mT[i] == 1)
    substitute(10^E, list(E = eT[i]))
  else if (drop.1 && x[i] == mi)
    mi
  else if (drop.1 && mT[i] == -1)
    substitute(-10^E, list(E = eT[i]))
  #else substitute(A %*% 10^E, list(A = mT[i], E = eT[i]))
  else ""
  do.call("expression", ss)
}

flow_breaks <- function(x, n = 6, equal.space = FALSE, trans.fun, inverse.fun){
  rng.raw <- range(x, na.rm = TRUE)
  if(equal.space){
    
    rng <- trans.fun(rng.raw)
    min <- floor(rng[1])
    max <- ceiling(rng[2])
    if (max == min)
      return(inverse.fun(min))
    by <- (max - min)/(n-1)
    
    myBreaks <- inverse.fun(seq(min, max, by = by))
    
  }else{
    #log10 (e.g. 0, 10, 1000, ...)
    
    base10raw <- unlist(lapply(1:n,function(e)10^e))
    base10raw <- unlist(lapply(base10raw, function(x) x*c(1:9)))
    base10raw <- c(0,base10raw)
    base10raw_minus = c()
    if(rng.raw[1] < 0){
      base10raw_minus <- unlist(lapply(ceiling(log10(abs(rng.raw[1]))):1,function(e)-10^e))
      base10raw_minus <- unlist(lapply(base10raw_minus, function(x) x*c(9:1)))
    }
    myBreaks <- unique(c(base10raw_minus,base10raw[base10raw>rng.raw[1]&base10raw<rng.raw[2]]))
  }
  myBreaks
}


flow_trans <- function(name, trans.fun, inverse.fun, equal.space = FALSE, n = 6){
  
  brk <- function(x){
    flow_breaks(x, n = n, equal.space = equal.space, trans.fun, inverse.fun)
  }
  
  if(equal.space){
    fmt <- format_format(digits = 0)
  }else{
    fmt <- function(x){
      pretty10exp(as.numeric(x),drop.1=TRUE)
    }
    
  }
  
  trans_new(name, transform = trans.fun, inverse = inverse.fun, breaks = brk, format = fmt)
}

logicle_trans <- function(..., n = 6, equal.space = FALSE){
  trans.obj <- logicleTransform(...)
  trans <- trans.obj@.Data
  inv <- inverseLogicleTransform(trans = trans.obj)@.Data
  flow_trans(name = "logicle", trans.fun = trans, inverse.fun = inv, n = n, equal.space = equal.space)
}

scale_x_logicle <- function(..., w = 0.5, t = 262144, m = 4.5, a = 0){
  myTrans <- logicle_trans(w = w, t = t, m = m, a = a)
  scale_x_continuous(..., trans = myTrans)
  
}

scale_y_logicle <- function(..., w = 0.5, t = 262144, m = 4.5, a = 0){
  myTrans <- logicle_trans(w = w, t = t, m = m, a = a)
  scale_y_continuous(..., trans = myTrans)
}

To run the FACS-data functions properly you need:
- amp_batches_file, for example "annotations/amp_batch2index_file_name.csv"
- wells_cells_file, for example "annotations/wells_cells.txt"
You also need to save all the FACS csv files in one place (for example "facs_data/csv_files/").
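A minimal sketch to check that the two annotation files carry the columns used by the merging code below (the column names are taken from that code):

```r
# Sketch: sanity-check the FACS annotation files before merging
amp_batches = read.csv("annotations/amp_batch2index_file_name.csv",
                       stringsAsFactors = FALSE)
well_cells  = read.delim("annotations/wells_cells.txt",
                         stringsAsFactors = FALSE)
stopifnot(all(c("Amp.Batch.ID", "csv_file") %in% colnames(amp_batches)))
stopifnot(all(c("Amp_batch_ID", "well_coordinates") %in% colnames(well_cells)))
```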

amp_batches_file = "annotations/amp_batch2index_file_name.csv"
wells_cells_file = "annotations/wells_cells.txt"
amp_batches = read.csv(amp_batches_file,stringsAsFactors = FALSE)

inx = lapply(amp_batches[1:8,"csv_file"],function(x) read.csv(paste0("facs_data/csv_files/",x),skip = 15,stringsAsFactors = FALSE))
names(inx) = amp_batches[1:8,"Amp.Batch.ID"]
inx = lapply(inx, function(x) {
  x[,c(2:ncol(x))] = apply(x[,c(2:ncol(x))],2, function(y) as.double(sub(",","",y)))
  return(x)})

well_cells = read.delim(wells_cells_file,stringsAsFactors = FALSE)
well_cells = subset(well_cells,well_cells$Amp_batch_ID %in% amp_batches[,"Amp.Batch.ID"] )

m = lapply(names(inx),function(x) merge(inx[[x]],well_cells[well_cells$Amp_batch_ID == x, c(1,2)],all.x=TRUE,by.x="Well",by.y="well_coordinates")) 
names(m) = names(inx)
m = lapply(m, function(x){
  rownames(x) = x$Well_ID
  x = x[-ncol(x)]
  x = x[order(x$All.Events.Time.Mean),]
  return(x)
})

library(plyr)
facs = ldply(lapply(m,function(x){
  colnames(x) = sub("All.Events.","",colnames(x))
  colnames(x) = sub(".Mean","",colnames(x))
  colnames(x)[12] = "ACSA-2"
  x$Well_ID = rownames(x)
  return(x)}),.id = "amp_batch_ID",data.frame)
rownames(facs) = facs$Well_ID
facs = facs[-ncol(facs)]

mc = scdb_mc(new_id)
color2name = mc@color_key$group
names(color2name) = mc@color_key$color

df = data.frame(Well_ID = names(mc@mc),mc=as.integer(mc@mc),color=mc@colors[as.integer(mc@mc)], cell_type = color2name[mc@colors[as.integer(mc@mc)]])
df = merge(df,facs,by.x="Well_ID",by.y=0)

library(flowCore)
library(ggcyto)
library(scales)


##### x and y axes with logicle transformation ##### 

pairs = list(c("FSC.A","SSC.A"),c("FSC.A","DAPI.Pacific.Blue.A"),c("CD45.APC.A","CD11b.BV605.A"),c("CD45.APC.A","ACSA.2"))
t = 1e6
l = 4.5
m = 4.5
for(pair in pairs){
  g = ggplot(df,aes_string(x = pair[1],y = pair[2])) +  geom_jitter(size = 2, aes(color = cell_type)) +
    scale_color_manual(values = names(color2name)  ,breaks= as.character(color2name),limits = as.character(color2name),labels = as.character(color2name)) + 
    scale_x_logicle(t = 10^ceiling(log10(max(df[[pair[1]]]))),m=m,w = (l-log10(t/abs(min(df[[pair[1]]]))))/2) +
    scale_y_logicle(t = 10^ceiling(log10(max(df[[pair[2]]]))),m=m ,w=(l-log10(t/abs(min(df[[pair[2]]]))))/2) +
    theme_classic() +
    theme(plot.title = element_text(size = 18, face = "bold"),
          legend.title =element_blank() ,legend.text=element_text(size = 14,family="Sans"),legend.box.spacing = unit(0.5,"in"),legend.key.width = unit(0.7,"in"),
          legend.position = "bottom",legend.box="horizontal", legend.background = element_rect(colour="#E0E0E0", size=.5, linetype="twodash")) +
    guides(color = guide_legend(title.position="top", title.hjust = 0.5))
  ggsave(paste0("results/",pair[1],"_",pair[2],".png"),width = 12, height = 12)
}
  
g = ggplot(df,aes_string(x = "CD45.APC.A",y = "ACSA.2")) +  geom_jitter(size = 2, aes(color = cell_type)) +
  scale_color_manual(values = names(color2name), breaks = as.character(color2name), limits = as.character(color2name), labels = as.character(color2name)) + 
  # use the plotted channels explicitly rather than the leftover loop variable `pair`,
  # which only happens to hold c("CD45.APC.A","ACSA.2") after the loop above
  scale_x_logicle(t = 10^ceiling(log10(max(df[["CD45.APC.A"]]))), m = m, w = (l-log10(t/abs(min(df[["CD45.APC.A"]]))))/2) +
  scale_y_logicle(t = 10^ceiling(log10(max(df[["ACSA.2"]]))), m = m, w = (l-log10(t/abs(min(df[["ACSA.2"]]))))/2) +
  theme_classic() + facet_wrap(~amp_batch_ID,nrow = 2) +
  theme(plot.title = element_text(size = 18, face = "bold"),
        legend.title =element_blank() ,legend.text=element_text(size = 14,family="Sans"),legend.box.spacing = unit(0.5,"in"),legend.key.width = unit(0.7,"in"),
        legend.position = "bottom",legend.box="horizontal", legend.background = element_rect(colour="#E0E0E0", size=.5, linetype="twodash")) +
  guides(color = guide_legend(title.position="top", title.hjust = 0.5))
ggsave(paste0("results/","CD45.APC.A_ACSA.2_amp_batch.png"),width = 22, height = 12)

We can see that cells assigned to astrocytes also have ACSA-2 signal above 10^3 and CD45 signal below 10^2.
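The gating observation can be quantified directly on the merged data frame built above, e.g. the fraction of astrocyte-assigned cells falling inside the ACSA-2 high / CD45 low gate:

```r
# Sketch: fraction of astrocyte-assigned cells inside the gate
astro = df[df$cell_type == "Astrocytes", ]
mean(astro$ACSA.2 > 1e3 & astro$CD45.APC.A < 1e2, na.rm = TRUE)
```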

CD45.APC.A_vs_ACSA.2

CD45.APC.A_vs_ACSA.2_by_Amp_Batch_ID